AITopics | Bắc Ninh

Collaborating Authors

Bắc Ninh

BERT-based model for Vietnamese Fact Verification Dataset

Tran, Bao, Khanh, T. N., Tuong, Khang Nguyen, Dang, Thien, Nguyen, Quang, Thinh, Nguyen T., Hung, Vo T.

arXiv.org Artificial Intelligence, Mar-1-2025

The rapid advancement of information and communication technology has made information easier to access. However, this progress has also made more stringent verification necessary to ensure the accuracy of information, particularly in Vietnam. This paper introduces an approach to Fact Verification on a Vietnamese dataset that integrates both sentence selection and classification modules into a unified network architecture. The approach leverages large language models by using pre-trained PhoBERT and XLM-RoBERTa as the backbone of the network. The proposed model was trained on a Vietnamese dataset, ISE-DSC01, and outperformed the baseline model on all three metrics. Notably, it achieved a Strict Accuracy of 75.11%, a 28.83% improvement over the baseline model.
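The abstract reports Strict Accuracy but does not define it. Our understanding of the ISE-DSC01 setting is that a claim counts as strictly correct only when two conditions hold: the predicted verdict matches, and, for SUPPORTED or REFUTED claims, the selected evidence sentence also matches the gold evidence. NEI claims need only the right verdict. This definition is an assumption, not something the abstract confirms. The sketch below computes that quantity. The label strings, the `(verdict, evidence)` tuple format, and the whitespace normalization are all hypothetical choices.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical record format: (verdict, evidence sentence); evidence is None for NEI claims.
Prediction = Tuple[str, Optional[str]]


def _normalize(text: Optional[str]) -> str:
    """Collapse runs of whitespace so trivial spacing differences are not counted as errors."""
    return " ".join(text.split()) if text else ""


def strict_accuracy(preds: Sequence[Prediction], golds: Sequence[Prediction]) -> float:
    """Fraction of claims whose verdict is correct AND, for SUPPORTED/REFUTED claims,
    whose predicted evidence matches the gold evidence (assumed definition)."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    correct = 0
    for (p_label, p_ev), (g_label, g_ev) in zip(preds, golds):
        if p_label != g_label:
            continue  # wrong verdict: no credit regardless of evidence
        if g_label == "NEI" or _normalize(p_ev) == _normalize(g_ev):
            correct += 1
    return correct / len(golds)


preds = [("SUPPORTED", "A is B."), ("REFUTED", "C is D."), ("NEI", None)]
golds = [("SUPPORTED", "A is  B."), ("REFUTED", "C is E."), ("NEI", None)]
print(strict_accuracy(preds, golds))  # 2 of 3 strictly correct: prints 0.6666666666666666
```

Under this definition, the second claim earns no credit even though its verdict is right. That behavior is the reason a model that selects evidence and classifies jointly can gain on this metric.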

dataset, fact verification, verification

doi: 10.1007/978-3-031-74127-2_19

arXiv:2503.00356

Country:

Asia > Vietnam > Hanoi > Hanoi (0.14)
Asia > Vietnam > Bắc Ninh Province > Bắc Ninh (0.05)
Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.05)

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Using Large Language Models for education managements in Vietnamese with low resources

Minh, Duc Do, Van, Vinh Nguyen, Cong, Thang Dam

arXiv.org Artificial Intelligence, Jan-24-2025

Large language models (LLMs), such as GPT-4, Gemini 1.5, Claude 3.5 Sonnet, and Llama 3, have demonstrated significant advances across NLP tasks since the release of ChatGPT in 2022. Despite this success, fine-tuning and deploying LLMs remain computationally expensive, especially in resource-constrained environments. In this paper, we propose VietEduFrame, a framework designed to apply LLMs to educational management tasks in Vietnamese institutions. Our key contribution is a tailored dataset, derived from student education documents at VNU Hanoi, that addresses the specific challenges faced by educational systems with limited resources. Through extensive experiments, we show that our approach outperforms existing methods in both accuracy and efficiency, offering a promising way to improve educational management in under-resourced environments. Our framework supplements real-world examples with synthetic data, and we discuss potential limitations regarding its broader applicability and robustness in future implementations.

large language model, machine learning, natural language

arXiv:2501.15022

Country:

Asia > Vietnam > Hanoi > Hanoi (0.34)
Asia > Vietnam > Bắc Ninh Province > Bắc Ninh (0.04)
South America > Uruguay > Maldonado > Maldonado (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Education > Educational Setting (1.00)
Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Multi-Dialect Vietnamese: Task, Dataset, Baseline Models and Challenges

Van Dinh, Nguyen, Dang, Thanh Chi, Nguyen, Luan Thanh, Van Nguyen, Kiet

arXiv.org Artificial Intelligence, Oct-4-2024

Vietnamese, a low-resource language, is typically divided into three primary dialect groups corresponding to Northern, Central, and Southern Vietnam. However, each province within these regions has its own distinct pronunciation variations. Although various speech recognition datasets exist, none provides a fine-grained classification of the 63 dialects specific to individual provinces of Vietnam. To address this gap, we introduce the Vietnamese Multi-Dialect (ViMD) dataset, a comprehensive new dataset capturing the diversity of 63 provincial dialects spoken across Vietnam. The dataset comprises 102.56 hours of audio in approximately 19,000 utterances, and the associated transcripts contain over 1.2 million words. To provide benchmarks and demonstrate the challenges posed by the dataset, we fine-tune state-of-the-art pre-trained models on two downstream tasks: (1) dialect identification and (2) speech recognition. The empirical results point to two implications: geographical factors influence dialects, and current approaches are limited on speech recognition tasks involving multi-dialect speech data. The dataset is available for research purposes.
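The abstract does not name its speech recognition metric. Word error rate (WER) is the conventional one, so the sketch below shows how such a score is computed. It is not the authors' evaluation code. WER is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. Because written Vietnamese separates syllables with spaces, splitting on whitespace makes this effectively a syllable-level rate. The example sentence is hypothetical. It mimics a Southern-dialect pronunciation ("tui" for "tôi") being transcribed literally.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed as a word-level Levenshtein distance with a rolling DP row."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# 1 substitution (tôi -> tui) + 1 deletion (hôm) over 5 reference words
print(word_error_rate("tôi đi học hôm nay", "tui đi học nay"))  # prints 0.4
```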

dataset, dialect, experiment

arXiv:2410.03458

Country:

Asia > Vietnam > Hanoi > Hanoi (0.14)
Asia > Vietnam > Thanh Hóa Province > Thanh Hóa (0.04)
Asia > Vietnam > Hưng Yên Province > Hưng Yên (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

VlogQA: Task, Dataset, and Baseline Models for Vietnamese Spoken-Based Machine Reading Comprehension

Ngo, Thinh Phuoc, Dang, Khoa Tran Anh, Luu, Son T., Van Nguyen, Kiet, Nguyen, Ngan Luu-Thuy

arXiv.org Artificial Intelligence, Feb-4-2024

This paper presents the development of a Vietnamese spoken-language corpus for machine reading comprehension (MRC) and discusses the challenges and opportunities of using real-world data for MRC tasks. Existing Vietnamese MRC corpora focus mainly on formal written documents such as Wikipedia articles, online newspapers, and textbooks. In contrast, VlogQA consists of 10,076 question-answer pairs based on 1,230 transcript documents sourced from YouTube, an extensive source of user-uploaded content, covering the topics of food and travel. By capturing the spoken language of native Vietnamese speakers in natural settings, an area largely overlooked in Vietnamese research, the corpus provides a valuable resource for future research on reading comprehension in Vietnamese. In our evaluation, the best of our deep-learning models achieved an F1 score of 75.34% on the test set, indicating substantial progress on MRC for Vietnamese spoken-language data. The highest exact match (EM) score was 53.97%, which reflects the difficulty of processing spoken content and highlights the need for further improvement.
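The F1 and EM figures above are standard extractive-MRC metrics. The sketch below shows one common way to compute them, in the style of the SQuAD evaluation: EM checks whether the normalized predicted span equals the gold span, and F1 measures token overlap between the two. The normalization here is an assumption, not necessarily VlogQA's exact procedure. It applies Unicode NFC, lowercases, strips ASCII punctuation, and splits on whitespace, which yields syllable tokens for Vietnamese. It deliberately drops the English article removal that the SQuAD script performs.

```python
import string
import unicodedata
from collections import Counter
from typing import List


def normalize_answer(text: str) -> List[str]:
    """NFC-normalize (Vietnamese diacritics may be composed or decomposed), lowercase,
    strip ASCII punctuation, and split into whitespace-separated tokens."""
    text = unicodedata.normalize("NFC", text).lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall between prediction and gold."""
    pred_tokens = normalize_answer(prediction)
    gold_tokens = normalize_answer(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty means no overlap is possible.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("phở bò", "phở bò Hà Nội"))           # prints 0.0
print(round(f1_score("phở bò", "phở bò Hà Nội"), 4))    # prints 0.6667
```

A partially correct span still earns F1 credit but no EM credit. This is consistent with the gap between the reported F1 (75.34%) and EM (53.97%). In SQuAD-style evaluation, each question's score is the maximum over its gold answers, and the per-question scores are then averaged and reported as percentages.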

corpus, dataset, transcript

arXiv:2402.02655

Country:

Asia > Vietnam > Hanoi > Hanoi (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)

Genre: Research Report (0.40)

Industry:

Education > Assessment & Standards > Student Performance (1.00)
Media > News (0.86)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Vietnam's CMC strives for IoT with Samsung's commitment

#artificialintelligence, Sep-21-2019, 18:51:45 GMT

A unit of South Korea's Samsung Electronics has bought a 30% stake, for more than $40 million, in CMC Corp., Vietnam's second-largest IT company. CMC hopes to use most of the proceeds to focus on developing "internet of things" (IoT) and artificial intelligence technologies. CMC expects the expanded partnership with Samsung, which has a global reach, to help double its overseas sales to more than 30% of its total by 2023. CMC has been making computer systems and IoT-related services for Samsung since 2016. Samsung has now completed the acquisition of a 25% stake through new CMC shares and bought the other 5% on the Ho Chi Minh Stock Exchange. Chairman and CEO Nguyen Trung Chinh said this commitment from Samsung will propel CMC to become a global company in the next five years. Samsung Electronics' systems development arm, Samsung SDS, had said in late July that it would buy the 25% stake for about 4 billion yen ($38 million).

cmc, samsung, vietnam

Country:

Asia > South Korea (0.39)
North America > United States (0.05)
Europe (0.05)

Industry:

Semiconductors & Electronics (1.00)
Banking & Finance > Trading (0.52)
Information Technology > Security & Privacy (0.33)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Networks (0.52)
